The Maximum Principle for Global Solutions of
Stochastic Stackelberg Differential Games

Alain Bensoussan, Shaokuan Chen and Suresh P. Sethi

October 30, 2012 

Abstract: This paper derives the maximum principle for both stochastic (global) open-loop
and stochastic (global) closed-loop Stackelberg differential games. For the closed-loop
case, we use the theory of controlled forward-backward stochastic differential equations
to derive the maximum principle for the leader's optimal strategy. In the special case of
the open-loop linear quadratic Stackelberg game, we treat the follower's Hamiltonian
system as the leader's state equation, derive the related stochastic Riccati equation, and
show the existence and uniqueness of its solution under appropriate assumptions. For the
closed-loop linear quadratic Stackelberg game, we write down the related Riccati equation,
which consists of forward-backward stochastic differential equations, and leave the
existence of its solution as an open problem.
Keywords: Stackelberg differential game, maximum principle, forward-backward stochastic
differential equation, Riccati equation.

1 Introduction 

In 1934, H. von Stackelberg introduced the concept of a hierarchical solution for markets
where some firms have power of domination over others [28]. This solution concept is now
known as the Stackelberg equilibrium or the Stackelberg solution which, in the context of
two-person nonzero-sum static games, involves players with asymmetric roles, one leading
(called the leader) and the other following (called the follower). A Stackelberg game
proceeds with the leader announcing his policy prior to the start of the game. With the
knowledge of the leader's strategy, the follower chooses a policy so as to optimize his
own performance index. The leader, anticipating the follower's optimal response, picks
the policy which optimizes his performance index on the rational reaction curve of the
follower, which together with the corresponding policy of the follower is known as the
Stackelberg solution.

In dynamic Stackelberg games, it becomes important to know the players' information
sets at any given time. In this paper, we will consider two different information structures:
i) open-loop for both players and ii) closed-loop perfect state (CLPS) for both players.
Moreover, we will only treat global solutions, where the leader announces his entire strategy
at the start of the game and the follower reacts to the entire strategy. The solutions of
games with the first information structure will be termed (global) open-loop Stackelberg
solutions, whereas the solutions of games with the second information structure will
be termed (global) closed-loop Stackelberg solutions. It is known that both of these solutions
suffer from time inconsistency, which results from the functional dependence of the
follower's optimal response strategy on the leader's entire strategy over the duration of the
game.

In addition to these concepts, there is the concept of a feedback Stackelberg solution,
where the Stackelberg property is retained at every stage (in the discrete-time setting)
with the leader having only a stagewise advantage over the follower. Since the continuous-time
problem can be viewed as the limit of the discrete-time problem as the number of stages
becomes unbounded in any finite interval, the stagewise advantage of the leader over the
follower turns into an instantaneous advantage. A good aspect of this solution is that it is
time consistent. Readers interested in the theory and applications of this solution can refer
to [2], [7], [10], [12], [33] and [H].

In an open-loop or closed-loop Stackelberg differential game, the follower aims at 
minimizing his cost functional in accordance with the leader's strategy on the whole 
duration of the game. Anticipating the follower's optimal response depending on his 
entire strategy, the leader chooses an optimal one in advance to minimize his own cost 
functional, based on the Hamiltonian system satisfied by the follower's optimal response. 
The difference between the two kinds of games is whether the information sets of the 
players involve the history of the state. The introduction of the history of the state in the 
closed-loop Stackelberg game, even in the deterministic case, makes it difficult to tackle, 
as the follower may not obtain his optimal response if the leader's announced strategy 
incorporates the memory of the state. Two approaches to circumvent this difficulty are 
introduced: the team approach and the maximum principle. For the former, one can refer 
to PP, [6] in the discrete-time setting and [19], [21], [22] and [1] in the continuous-time
setting. For the latter, one can refer to [20] for nonclassical control problems arising from 
Stackelberg games. The idea of team approach is as follows: the leader first minimizes his 
cost functional over the controls of both the leader and the follower, yielding a lower bound 
on his cost functional and the team strategies for both players. Then the leader makes an 
effort to find a closed-loop strategy such that the follower's optimal response and the state 
trajectory will coincide with his team strategy and the team optimal trajectory, which 
leads to the lower bound on the leader's cost functional. The maximum principle approach 
restricts the leader's strategy to depend only on the initial state and the current state 
(memoryless perfect state information structure) and a nonclassical control problem faced 
by the leader is solved. It is worth noting that in this case, the follower's adjoint equation 
involves the derivative of the leader's strategy with respect to the state. Therefore, after 
incorporating the follower's adjoint variable as an augmented state, the leader encounters 
a nonclassical control problem with the feature that both the control and its derivative 
with respect to the state appear in the controlled forward-backward ordinary differential 
equation system. The authors of [20] provide two approaches to tackle this problem and give the
necessary conditions satisfied by the leader's optimal strategy. One is to directly apply
the variational technique to the state system with mixed-boundary conditions (the adjoint
equation of the follower with a terminal condition). The other is to establish an
equivalence between such a nonclassical control problem and a classical control problem,
which shows that the optimal strategy can be found in the space of affine functions.
The phenomenon of time inconsistency is also analyzed in [20]. We will elaborate
on the technical details and generalize their result to the stochastic setting in section 4.

For the stochastic formulation of Stackelberg games involving white noise terms, Yong
[30] studies the open-loop linear quadratic case, with control variables appearing in the
diffusion term of the state equation. To give a state feedback representation of the open-loop
Stackelberg solution (in a non-anticipating way), the related Riccati equation is derived and
sufficient conditions for the existence of its solution with deterministic coefficients are
discussed. More recently, Øksendal et al. [H] have considered a general stochastic open-loop
Stackelberg differential game, proved a sufficient maximum principle, and applied the theory
to continuous-time newsvendor problems.

In this paper, we study stochastic global Stackelberg differential games with open- 
loop and closed-loop information structures. As we shall see, the problems confronted by 
the leader in both cases, from the current point of view, are control problems with the 
state equations being forward-backward stochastic differential equations (FBSDEs). The 
theories for nonlinear backward stochastic differential equations (BSDEs) and FBSDEs 
have been extensively studied over the last two decades following the initial work by 
Pardoux and Peng [23]. One can refer to, among others, [15], [16], [21], [26], [51], and the
references therein for the development of the theory of FBSDEs and their applications.
With the help of the results in optimization problems for controlled FBSDEs (see, e.g., 
[27] and [32]), we obtain the maximum principle for the leader's optimal strategies in 
stochastic global Stackelberg games, and discuss linear quadratic problems as well as the 
corresponding Riccati equations. 

This paper is organized as follows. In section 2 we formulate a stochastic Stackelberg
game and give three types of equilibrium concepts. In section 3 we present the maximum
principle for a stochastic open-loop Stackelberg game. In section 4 we focus on a stochastic
closed-loop Stackelberg game and derive a maximum principle for the leader's optimal
strategy. As examples, linear quadratic stochastic open-loop and closed-loop Stackelberg
games are studied in section 5. For the open-loop linear quadratic case, we show the existence
and uniqueness of the solution to the associated stochastic Riccati equation under
some assumptions. For the closed-loop case, we only derive a new Riccati equation
consisting of FBSDEs, without investigating the existence of its solution.



2 Problem formulation and definition of equilibria 

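Let $(\Omega, \mathcal{F}, P)$ be a complete probability space on which is defined a $d$-dimensional standard
Brownian motion $\{W(t), 0 \le t \le T\}$. $\{\mathcal{F}_t\}_{0 \le t \le T}$ is the natural filtration generated by
$W$ and augmented by all the $P$-null sets in $\mathcal{F}$, and $\mathcal{P}$ is the predictable sub-$\sigma$-field of
$\mathcal{B}([0,T]) \times \mathcal{F}$.

We consider a stochastic differential system

$$
\begin{cases}
dx(t) = f(t, x(t), u(t), v(t))\,dt + \sigma(t, x(t))\,dW(t), \\
x(0) = x_0,
\end{cases}
$$

where

$$
f : \Omega \times [0,T] \times \mathbb{R}^n \times \mathbb{R}^{m_1} \times \mathbb{R}^{m_2} \to \mathbb{R}^n, \qquad
\sigma : \Omega \times [0,T] \times \mathbb{R}^n \to \mathbb{R}^{n \times d},
$$

are $\mathcal{P} \times \mathcal{B}(\mathbb{R}^{n+m_1+m_2})/\mathcal{B}(\mathbb{R}^n)$ and $\mathcal{P} \times \mathcal{B}(\mathbb{R}^n)/\mathcal{B}(\mathbb{R}^{n \times d})$ measurable, respectively, and
$(u(\cdot), v(\cdot))$ are the decision variables of the leader and the follower, respectively. The
cost functionals for the leader and the follower to minimize are

$$
J_1(u,v) = E\Big[\int_0^T g_1(t, x(t), u(t), v(t))\,dt + G_1(x(T))\Big],
$$

$$
J_2(u,v) = E\Big[\int_0^T g_2(t, x(t), u(t), v(t))\,dt + G_2(x(T))\Big],
$$

with

$$
g_i : \Omega \times [0,T] \times \mathbb{R}^n \times U \times V \to \mathbb{R}, \qquad
G_i : \Omega \times \mathbb{R}^n \to \mathbb{R},
$$

$i = 1, 2$, being $\mathcal{P} \times \mathcal{B}(\mathbb{R}^n) \times \mathcal{B}(U) \times \mathcal{B}(V)/\mathcal{B}(\mathbb{R})$ and $\mathcal{F}_T \times \mathcal{B}(\mathbb{R}^n)/\mathcal{B}(\mathbb{R})$ measurable,
respectively.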

According to the players' information sets at any given time, there are three types
of Stackelberg games: (global) open-loop, (global) closed-loop, and feedback Stackelberg
games.

Open-loop games: In an open-loop Stackelberg game, the leader's information set
at time $t$ is $\{x_0, \mathcal{F}_t\}$. Therefore, the strategy $u$ announced by the leader is an $\mathcal{F}_t$-adapted
process. The follower aims at minimizing his cost functional $J_2(u,v)$ in accordance with
the leader's strategy $u$ on the whole duration of the game. His optimal response $\Phi(u)$ will
be an adapted process such that

$$
J_2(u, \Phi(u)) \le J_2(u, v), \quad \forall u, v.
$$

The leader, anticipating the follower's optimal response $\Phi$, picks the policy $u^*$ which
optimizes his performance index on the rational reaction curve of the follower, i.e.,

$$
J_1(u^*, \Phi(u^*)) \le J_1(u, \Phi(u)), \quad \forall u.
$$

$(u^*, \Phi(u^*))$ is a Stackelberg solution for an open-loop game.
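
The two nested minimizations in this definition can be sketched on a finite-dimensional toy problem. The following Python fragment is only an illustration under assumptions that are not made in the paper: the decisions are scalars rather than adapted processes, and the quadratic costs `leader_cost` and `follower_cost` (as well as the constant `REACTION_SLOPE`) are hypothetical, chosen so that the reaction map and the leader's optimum are available in closed form.

```python
# Toy static Stackelberg game with scalar decisions u (leader) and v (follower).
# All cost functions here are hypothetical and serve only to illustrate the
# order of the two minimizations.

REACTION_SLOPE = 0.5  # assumed: the follower's ideal response is v = 0.5 * u


def follower_cost(u, v):
    """J2(u, v): the follower wants v close to REACTION_SLOPE * u."""
    return (v - REACTION_SLOPE * u) ** 2


def leader_cost(u, v):
    """J1(u, v): the leader wants u near 1 and v near 0."""
    return (u - 1.0) ** 2 + v ** 2


def follower_response(u):
    """Rational reaction Phi(u) = argmin over v of J2(u, v)."""
    return REACTION_SLOPE * u


def leader_cost_on_reaction_curve(u):
    """J1 restricted to the follower's rational reaction curve."""
    return leader_cost(u, follower_response(u))


def stackelberg_solution():
    """Minimize (u - 1)^2 + (k u)^2 over u; the first-order condition
    2 (u - 1) + 2 k^2 u = 0 gives u* = 1 / (1 + k^2)."""
    u_star = 1.0 / (1.0 + REACTION_SLOPE ** 2)
    return u_star, follower_response(u_star)
```

The leader never minimizes over `v` directly; he only evaluates his cost along `follower_response`, which is the finite-dimensional analogue of optimizing on the rational reaction curve.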

Closed-loop games: In a closed-loop Stackelberg game, the information set for the
leader at time $t$ is $\{\mathcal{F}_t, x_s, s \in [0,t]\}$ (closed-loop perfect state information). The strategy
that the leader adopts now can incorporate the history of the state. Since
in general it is difficult for the follower to obtain his optimal response if the leader's
announced strategy incorporates the whole history of the state, we only consider the
closed-loop case under the memoryless perfect state information pattern, i.e., the information
set of the leader at time $t$ is $\{x_0, x_t, \mathcal{F}_t\}$. For each strategy $u(t, x_0, x)$ of the leader,
which is now a stochastic field, the follower tries to find his optimal response $\Psi(u)$ such
that

$$
J_2(u, \Psi(u)) \le J_2(u, v), \quad \forall u, v.
$$

Taking into account the follower's optimal response, the leader should choose $u^*$ such that

$$
J_1(u^*, \Psi(u^*)) \le J_1(u, \Psi(u)), \quad \forall u.
$$

$(u^*, \Psi(u^*))$ is a Stackelberg solution for a closed-loop game.

Feedback games: In a feedback Stackelberg game, the information set for the leader
at time $t$ is $\{x_t, \mathcal{F}_t\}$ (feedback pattern). The significant difference between
feedback games and the former two types of games is that the advantage of the leader
over the follower in a feedback Stackelberg game is instantaneous rather than global, as the
differential game can be viewed as the limit of the discrete-time game as the number
of stages becomes unbounded (see [2]). Therefore, corresponding to the leader's instantaneous
strategy $u(t,x)$, the follower will make an instantaneous response of the form
$v(t, x, u(t,x))$, which depends on the current state and the leader's current action. A
feedback solution is a pair of strategies $(u^*, v^*)$ such that

$$
J_1(u^*, v^*(u^*)) \le J_1(u, v^*(u)), \quad \forall u,
$$

$$
J_2(u^*, v^*(u^*)) \le J_2(u^*, v(u^*)), \quad \forall v.
$$

From the definition we can see that the feedback Stackelberg solution has an equilibrium
feature, whereas the open-loop or closed-loop solution involves a sequential optimization
at the level of the follower and the leader.



3 Stochastic open-loop Stackelberg differential games 

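We first introduce some notation. For two vectors $x$ and $y$ in $\mathbb{R}^n$, $\langle x, y \rangle$ means the inner
product $\sum_{i=1}^n x_i y_i$. For a function $f$ defined on $\mathbb{R}^n$, $Df$ or $\partial f$ means the gradient of $f$.
Here we specify that throughout this paper all the vectors are column vectors and the
gradient of a scalar function $f$ is $\frac{\partial f}{\partial x} = \big(\frac{\partial f}{\partial x_1}, \cdots, \frac{\partial f}{\partial x_n}\big)^T$, while the gradient of a vector
function $f = (f_1, \cdots, f_m)^T$ is the matrix

$$
\frac{\partial f}{\partial x} =
\begin{pmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\vdots & & \vdots \\
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{pmatrix}.
$$

We further introduce two spaces of adapted processes to be used in the definition of the
solution to an FBSDE,

$$
S^2(0,T;\mathbb{R}^n) := \Big\{\psi \,\Big|\, \psi : \Omega \times [0,T] \to \mathbb{R}^n \text{ is a continuous adapted process such that } E \sup_{0 \le t \le T} |\psi(t)|^2 < \infty \Big\},
$$

$$
M^2(0,T;\mathbb{R}^n) := \Big\{\psi \,\Big|\, \psi : \Omega \times [0,T] \to \mathbb{R}^n \text{ is an adapted process such that } E \int_0^T |\psi(t)|^2\,dt < \infty \Big\}.
$$

The above two spaces will be simply written as $S^2$ and $M^2$, respectively, if no confusion
arises.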

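The admissible strategy spaces for the leader and the follower are denoted by

$$
\mathcal{U} = \Big\{u \,\Big|\, u : \Omega \times [0,T] \to U \text{ is } \mathcal{F}_t\text{-adapted and } E \int_0^T |u(t)|^2\,dt < +\infty \Big\},
$$

$$
\mathcal{V} = \Big\{v \,\Big|\, v : \Omega \times [0,T] \to V \text{ is } \mathcal{F}_t\text{-adapted and } E \int_0^T |v(t)|^2\,dt < +\infty \Big\},
$$

where $U$ and $V$ are subsets of $\mathbb{R}^{m_1}$ and $\mathbb{R}^{m_2}$.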

For completeness, we state the formulation of general stochastic open-loop
Stackelberg games and the corresponding maximum principle. From the definition
in section 2, given the leader's strategy $u \in \mathcal{U}$, the follower faces the stochastic control
problem

$$
\min_{v \in \mathcal{V}} J_2(u,v) = E\Big[\int_0^T g_2(t, x(t), u(t), v(t))\,dt + G_2(x(T))\Big]
$$

subject to

$$
\begin{cases}
dx(t) = f(t, x(t), u(t), v(t))\,dt + \sigma(t, x(t))\,dW(t), \\
x(0) = x_0.
\end{cases}
$$

Suppose there exists a unique solution $v^*(u(\cdot)) \in \mathcal{V}$ to the above problem for each $u \in \mathcal{U}$.
If we define

$$
H_2(t,x,u,v,p_2,q_2) := \langle p_2, f(t,x,u,v) \rangle + \langle q_2, \sigma(t,x) \rangle + g_2(t,x,u,v),
$$

then the maximum principle (see [33]) yields that there exists a pair of adapted processes
$(p_2, q_2) \in S^2 \times M^2$ such that

$$
\begin{cases}
dx(t) = f(t, x(t), u(t), v^*(t))\,dt + \sigma(t, x(t))\,dW(t), \\
-dp_2(t) = \Big\{ \big(\frac{\partial f}{\partial x}\big)^T(t, x(t), u(t), v^*(t))\,p_2(t) + \big(\frac{\partial \sigma}{\partial x}\big)^T(t, x(t))\,q_2(t) \\
\qquad\qquad\quad + \frac{\partial g_2}{\partial x}(t, x(t), u(t), v^*(t)) \Big\}\,dt - q_2(t)\,dW(t), \\
x(0) = x_0, \quad p_2(T) = \frac{\partial G_2}{\partial x}(x(T)), \\
v^*(t) = \arg\min_{v} H_2(t, x(t), u(t), v, p_2(t), q_2(t)).
\end{cases} \tag{3.1}
$$

We assume that the last equation in (3.1) implicitly and uniquely defines a function
$v = v^*(t,x,u,p_2)$. After substituting $v = v^*(t,x,u,p_2)$ into the follower's maximum
principle, we get the control problem faced by the leader



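$$
\min_{u \in \mathcal{U}} J_1(u) = E\Big[\int_0^T g_1(t, x(t), u(t), v^*(t, x(t), u(t), p_2(t)))\,dt + G_1(x(T))\Big]
$$

subject to

$$
\begin{cases}
dx(t) = f(t, x(t), u(t), v^*(t, x(t), u(t), p_2(t)))\,dt + \sigma(t, x(t))\,dW(t), \\
-dp_2(t) = \Big\{ \big(\frac{\partial f}{\partial x}\big)^T(t, x(t), u(t), v^*(t, x(t), u(t), p_2(t)))\,p_2(t) + \big(\frac{\partial \sigma}{\partial x}\big)^T(t, x(t))\,q_2(t) \\
\qquad\qquad\quad + \frac{\partial g_2}{\partial x}(t, x(t), u(t), v^*(t, x(t), u(t), p_2(t))) \Big\}\,dt - q_2(t)\,dW(t), \\
x(0) = x_0, \quad p_2(T) = \frac{\partial G_2}{\partial x}(x(T)).
\end{cases} \tag{3.2}
$$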

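We denote

$$
\begin{aligned}
H_1(t,u,x,y,p_1,p_2,q_1,q_2)
={}& \langle p_1, f(t,x,u,v^*(t,x,u,p_2)) \rangle + \langle q_1, \sigma(t,x) \rangle + g_1(t,x,u,v^*(t,x,u,p_2)) \\
& - \Big\langle y, \big(\tfrac{\partial f}{\partial x}\big)^T(t,x,u,v^*(t,x,u,p_2))\,p_2 + \big(\tfrac{\partial \sigma}{\partial x}\big)^T(t,x)\,q_2 + \tfrac{\partial g_2}{\partial x}(t,x,u,v^*(t,x,u,p_2)) \Big\rangle.
\end{aligned} \tag{3.3}
$$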



Suppose $u^*$ is an optimal strategy for the leader. Then the maximum principle for controlled
forward-backward stochastic differential equations (see, e.g., [27] or [32]) yields
that there exists a triple of adapted processes $(p_1, q_1, y)$ such that

$$
u^*(t) = \arg\min_{u} H_1(t, u, x(t), y(t), p_1(t), p_2(t), q_1(t), q_2(t)), \tag{3.4}
$$



and

$$
\begin{cases}
dy(t) = -\frac{\partial H_1}{\partial p_2}\,dt - \frac{\partial H_1}{\partial q_2}\,dW(t), \\
dp_1(t) = -\frac{\partial H_1}{\partial x}\,dt + q_1(t)\,dW(t), \\
y(0) = 0, \quad p_1(T) = -\frac{\partial^2 G_2}{\partial x^2}(x(T))\,y(T) + \frac{\partial G_1}{\partial x}(x(T)).
\end{cases} \tag{3.5}
$$

Here the partial derivatives of $H_1$ are computed from (3.3), taking into account the
dependence of $v^*(t,x,u,p_2)$ on $x$ and $p_2$ through the chain rule, and are evaluated at
$(t, u^*(t), x(t), y(t), p_1(t), p_2(t), q_1(t), q_2(t))$.



4 Stochastic closed-loop Stackelberg games 

In this section, we consider a stochastic closed-loop Stackelberg game, which is a stochastic
version of the game studied in [20]. The difference between open-loop Stackelberg games and
closed-loop Stackelberg games is that in the former case the leader's information set is the
$\sigma$-field $\mathcal{F}_t$ generated by the Brownian motion $W$, whereas in the latter case the leader's
information set involves both the $\sigma$-field $\mathcal{F}_t$ and the history of the state $x$. As stated in the
introduction, the difficulty of studying closed-loop Stackelberg games arises from the fact
that the reaction of the follower cannot be determined explicitly if the leader's strategy
depends on the whole history of the state (CLPS information structure). However, if the
leader's strategy is restricted to be memoryless, i.e., only the current state is involved
in the strategy, Papavassilopoulos and Cruz [20] provide an efficient way to solve such a
problem. As demonstrated in [20], the derivative $\frac{\partial u}{\partial x}$ of the leader's strategy $u$ will appear
in the follower's adjoint equation and further in the leader's augmented state equation,
which makes the leader's control problem a nonclassical one.



4.1 The deterministic case revisited 

Since we apply the approach of Papavassilopoulos and Cruz [20] to solve the stochastic
version of closed-loop Stackelberg games, we first elaborate on their techniques in this
subsection. The state and the cost functionals for the leader and the follower are as follows

$$
\begin{cases}
\dot{x}(t) = f(t, x(t), u(t), v(t)), \\
x(0) = x_0,
\end{cases} \tag{4.1}
$$

$$
\begin{aligned}
J_1(u,v) &= \int_0^T g_1(t, x(t), u(t), v(t))\,dt + G_1(x(T)), \\
J_2(u,v) &= \int_0^T g_2(t, x(t), u(t), v(t))\,dt + G_2(x(T)).
\end{aligned} \tag{4.2}
$$



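Given the leader's strategy $u(t,x)$, $t \in [0,T]$ (we omit the dependence on the initial
state $x_0$), which is continuously differentiable in $x$, if the follower's optimal response is
$v^*$, then according to the deterministic maximum principle, there exists a function $p$ such
that

$$
\begin{cases}
\dot{x} = f(t,x,u,v^*), \\
-\dot{p} = \big(\frac{\partial f}{\partial x} + \frac{\partial f}{\partial u}\frac{\partial u}{\partial x}\big)^T p + \frac{\partial g_2}{\partial x} + \big(\frac{\partial u}{\partial x}\big)^T \frac{\partial g_2}{\partial u}, \\
\big(\frac{\partial f}{\partial v}\big)^T p + \frac{\partial g_2}{\partial v} = 0, \\
x(0) = x_0, \quad p(T) = \frac{\partial G_2(x(T))}{\partial x}.
\end{cases} \tag{4.3}
$$

Suppose we can get the unique solution

$$
v = \varphi(t,x,p,u) \tag{4.4}
$$

from solving

$$
\Big(\frac{\partial f}{\partial v}\Big)^T p + \frac{\partial g_2}{\partial v} = 0.
$$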



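Then, after substituting the expression (4.4) into (4.3) and $J_1$, the leader will be faced
with the following problem

$$
\min_{u} J_1(u) = \int_0^T g_1(t, x, u, \varphi(t,x,p,u))\,dt + G_1(x(T)) \tag{4.5}
$$

subject to

$$
\begin{cases}
\dot{x} = f(t, x, u, \varphi(t,x,p,u)), \\
-\dot{p} = \big(\frac{\partial f}{\partial x} + \frac{\partial f}{\partial u}\frac{\partial u}{\partial x}\big)^T p + \frac{\partial g_2}{\partial x} + \big(\frac{\partial u}{\partial x}\big)^T \frac{\partial g_2}{\partial u}, \\
x(0) = x_0, \quad p(T) = \frac{\partial G_2(x(T))}{\partial x}.
\end{cases} \tag{4.6}
$$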



Since the derivative $\frac{\partial u}{\partial x}$ of the control variable $u$ is involved in the adjoint equation (4.6),
the above problem is a nonclassical one. The authors of [20] provide two approaches to overcome
this difficulty. One is the direct application of variational techniques. The other one
is more interesting, as it reveals the relative independence of $u$ and $\frac{\partial u}{\partial x}$ and the time
inconsistency property. To be more precise, with $\frac{\partial u}{\partial x}$ replaced by a new control
variable $\tilde{u}$, they construct a new classical problem

$$
\min_{u, \tilde{u}} J_1(u) = \int_0^T g_1(t, x, u, \varphi(t,x,p,u))\,dt + G_1(x(T)) \tag{4.7}
$$

subject to

$$
\begin{cases}
\dot{x} = f(t, x, u, \varphi(t,x,p,u)), \\
-\dot{p} = \big(\frac{\partial f}{\partial x} + \frac{\partial f}{\partial u}\tilde{u}\big)^T p + \frac{\partial g_2}{\partial x} + \tilde{u}^T \frac{\partial g_2}{\partial u}, \\
x(0) = x_0, \quad p(T) = \frac{\partial G_2(x(T))}{\partial x},
\end{cases} \tag{4.8}
$$

and prove the equivalence of the nonclassical problem (4.5)-(4.6) and the constructed
classical problem (4.7)-(4.8) in the sense that they have the same optimal trajectory
and costs. Indeed, if we denote by $J_1^*$ and $J_2^*$ the optimal values of problems
(4.5)-(4.6) and (4.7)-(4.8), respectively, then $J_1^* \ge J_2^*$, since every strategy $u(t,x)$
admissible for (4.5)-(4.6) induces the pair $\big(u(t,x(t)), \frac{\partial u}{\partial x}(t,x(t))\big)$, which is admissible for
(4.7)-(4.8) and yields the same trajectory and cost. On the other hand, suppose that
$(u^*, \tilde{u}^*)$ is an optimal control for problem (4.7)-(4.8) and $x^*$ is the corresponding trajectory.
Then the control

$$
u(t,x) := \tilde{u}^*(t)\,x + u^*(t) - \tilde{u}^*(t)\,x^*(t) \tag{4.9}
$$

yields the same trajectory $x^*$ and thus the same cost in problem (4.5)-(4.6). Consequently,
$J_1^* = J_2^*$ and $u$ is an optimal control for the nonclassical problem (4.5)-(4.6). Therefore,
one can substitute $\frac{\partial u}{\partial x}$ for $\tilde{u}$ in the maximum principle for the problem (4.7)-(4.8) and
finally get the maximum principle for the nonclassical problem (4.5)-(4.6) faced by the
leader.
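
The key property of the affine construction (4.9) is that, along the optimal trajectory, it reproduces both the open-loop control value and the prescribed state derivative. The following Python fragment is a minimal sketch of that property in the scalar case; the three time functions `u_star`, `u_tilde_star` and `x_star` are hypothetical placeholders (not taken from the paper) standing in for an optimal pair of problem (4.7)-(4.8) and its trajectory.

```python
import math


def make_affine_strategy(u_star, u_tilde_star, x_star):
    """Build u(t, x) = u~*(t) x + u*(t) - u~*(t) x*(t), the scalar form of (4.9)."""
    def u(t, x):
        return u_tilde_star(t) * x + u_star(t) - u_tilde_star(t) * x_star(t)
    return u


# Hypothetical data, chosen only for illustration.
def u_star(t):
    return math.cos(t)


def u_tilde_star(t):
    return -0.5 * t


def x_star(t):
    return 1.0 + t ** 2


u = make_affine_strategy(u_star, u_tilde_star, x_star)


def slope_in_x(t, x, h=1e-6):
    """Central finite difference approximating du/dx at (t, x)."""
    return (u(t, x + h) - u(t, x - h)) / (2.0 * h)
```

On the trajectory, `u(t, x_star(t))` equals `u_star(t)` and `slope_in_x(t, x_star(t))` approximates `u_tilde_star(t)`, which is why the closed-loop strategy generates the same state and adjoint paths as the pair of open-loop controls.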

Remark 4.1. Given the leader's strategy $u(t,x)$, $t \in [0,T]$, the follower can also solve the
following Hamilton-Jacobi-Bellman equation

$$
\begin{cases}
\frac{\partial V_2}{\partial t} + \min_{v} \Big\{ \Big\langle \frac{\partial V_2}{\partial x}, f(t, x, u(t,x), v) \Big\rangle + g_2(t, x, u(t,x), v) \Big\} = 0, \\
V_2(T,x) = G_2(x),
\end{cases} \tag{4.10}
$$

and obtain the optimal feedback strategy

$$
v^*(t,x) = \arg\inf_{v \in \mathbb{R}^{m_2}} \Big\{ \Big\langle \frac{\partial V_2}{\partial x}, f(t, x, u(t,x), v) \Big\rangle + g_2(t, x, u(t,x), v) \Big\}.
$$

However, since $V_2$ depends on the whole function $u(\cdot)$, it is impossible for the leader to
employ dynamic programming to characterize his optimal strategy. The maximum principle
approach turns out to be more appropriate for closed-loop Stackelberg games.



4.2 The stochastic case 

In this subsection we tackle closed-loop Stackelberg games in the stochastic context, with
the same idea as in [20]. After introducing a stochastic disturbance term in the state equation
(4.1), the adjoint equation for the follower, which also acts as the state equation in
the leader's problem, will be a BSDE rather than an ODE with a terminal condition.
Therefore, the leader will end up with a control problem in which the state equation
consists of an SDE and a BSDE, with the feature that both the control $u$ and its derivative
$\frac{\partial u}{\partial x}$ are introduced in the controlled system. With the results on the maximum principle for
control problems of FBSDEs, we present the necessary conditions that the leader's optimal
strategy must satisfy in a closed-loop Stackelberg game.

We first introduce the admissible strategy spaces for the leader and the follower

$$
\begin{aligned}
\mathcal{U} := \Big\{ u \,\Big|\, & u : \Omega \times [0,T] \times \mathbb{R}^n \to U \text{ is } \mathcal{F}_t\text{-adapted for any } x \in \mathbb{R}^n,\ u(t,x) \text{ is continuously} \\
& \text{differentiable in } x \text{ for any } (\omega, t) \in \Omega \times [0,T], \text{ and the derivative } \tfrac{\partial u}{\partial x} \text{ is bounded} \Big\}, \\
\mathcal{V} := \Big\{ v \,\Big|\, & v : \Omega \times [0,T] \times \mathbb{R}^n \to V \text{ is } \mathcal{F}_t\text{-adapted for any } x \in \mathbb{R}^n \Big\}.
\end{aligned}
$$

Then, given the leader's strategy $u(t,x)$, the follower's optimal response strategy $v^*(t,x)$
is a solution to the following classical optimal control problem,

$$
\min_{v \in \mathcal{V}} J_2 = E \int_0^T g_2(t, x(t), u(t, x(t)), v(t))\,dt + E\,G_2(x(T)), \tag{4.11}
$$

subject to

$$
\begin{cases}
dx(t) = f(t, x(t), u(t, x(t)), v(t))\,dt + \sigma(t, x(t))\,dW(t), \\
x(0) = x_0.
\end{cases} \tag{4.12}
$$

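According to the maximum principle, there exists a pair of adapted processes $(p_2, q_2) \in
S^2 \times M^2$ such that

$$
v^*(t, x(t)) = \arg\min_{v} \big\{ \langle p_2(t), f(t, x(t), u(t, x(t)), v) \rangle + \langle q_2, \sigma(t,x) \rangle + g_2(t, x(t), u(t, x(t)), v) \big\}, \tag{4.13}
$$

and

$$
\begin{cases}
dp_2(t) = -\Big[ \big(\frac{\partial f}{\partial x} + \frac{\partial f}{\partial u}\frac{\partial u}{\partial x}\big)^T p_2 + \big(\frac{\partial \sigma}{\partial x}\big)^T q_2 + \frac{\partial g_2}{\partial x} + \big(\frac{\partial u}{\partial x}\big)^T \frac{\partial g_2}{\partial u} \Big]\,dt + q_2(t)\,dW(t), \\
p_2(T) = \frac{\partial G_2}{\partial x}(x(T)),
\end{cases} \tag{4.14}
$$

where $x(\cdot)$ is the solution of (4.12) with policies $u(t,x)$ and $v^*(t,x)$. Suppose that for any
strategy $u(t,x)$ of the leader, there exists a unique strategy $v^*(t,x)$ for the follower that
minimizes his cost functional $J_2$. We also suppose that (4.13) yields $v^* = \varphi(t,x,u,p_2)$. Then,
taking into account the follower's optimal response, the leader will be confronted with the
optimal control problem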



$$
\min_{u \in \mathcal{U}} J_1 = E \int_0^T g_1(t, x(t), u(t, x(t)), \varphi(t, x(t), u(t, x(t)), p_2(t)))\,dt + E\,G_1(x(T)) \tag{4.15}
$$

subject to

$$
\begin{cases}
dx(t) = f(t, x(t), u(t, x(t)), \varphi(t, x(t), u(t, x(t)), p_2(t)))\,dt + \sigma(t, x(t))\,dW(t), \\
dp_2(t) = -\Big[ \big(\frac{\partial f}{\partial x} + \frac{\partial f}{\partial u}\frac{\partial u}{\partial x}\big)^T p_2 + \big(\frac{\partial \sigma}{\partial x}\big)^T q_2 + \frac{\partial g_2}{\partial x} + \big(\frac{\partial u}{\partial x}\big)^T \frac{\partial g_2}{\partial u} \Big]\,dt + q_2(t)\,dW(t), \\
x(0) = x_0, \quad p_2(T) = \frac{\partial G_2}{\partial x}(x(T)).
\end{cases} \tag{4.16}
$$

It can be seen that, after incorporating the follower's adjoint variable as an augmented
state, the leader encounters a controlled FBSDE, which is the counterpart of (4.6) in the
deterministic context. For the solvability of FBSDEs, one can refer to [13], [2E], [21], [31],
and the references therein. Here we assume that the leader's problem is well-posed, i.e.,
for each $u(\cdot) \in \mathcal{U}$, there exists a unique triple $(x, p_2, q_2) \in S^2 \times S^2 \times M^2$ solving FBSDE
(4.16). Since the derivative $\frac{\partial u}{\partial x}$ of the control variable $u$ is involved in the BSDE in (4.16),
we apply the techniques of the deterministic case to relate the above nonclassical control
problem to a classical one.

Consider the optimization problem of a controlled FBSDE

$$
\min_{u_1, u_2} J(u_1(\cdot), u_2(\cdot)) = E \int_0^T g_1(t, x(t), u_1(t), \varphi(t, x(t), u_1(t), p_2(t)))\,dt + E\,G_1(x(T)), \tag{4.17}
$$

subject to

$$
\begin{cases}
dx(t) = f(t, x(t), u_1(t), \varphi(t, x(t), u_1(t), p_2(t)))\,dt + \sigma(t, x(t))\,dW(t), \\
dp_2(t) = -\Big[ \big(\frac{\partial f}{\partial x} + \frac{\partial f}{\partial u} u_2\big)^T p_2 + \big(\frac{\partial \sigma}{\partial x}\big)^T q_2 + \frac{\partial g_2}{\partial x} + u_2^T \frac{\partial g_2}{\partial u} \Big]\,dt + q_2(t)\,dW(t), \\
x(0) = x_0, \quad p_2(T) = \frac{\partial G_2}{\partial x}(x(T)),
\end{cases} \tag{4.18}
$$

where $u_1$ and $u_2$ are adapted control variables with values in $U$ and some bounded subset
of $\mathbb{R}^{m_1 \times n}$, respectively. Again we assume the above problem is well-posed. If
we denote by $J_1^*$ and $J_2^*$ the optimal values of problems (4.15)-(4.16) and (4.17)-(4.18),
respectively, then $J_1^* \ge J_2^*$, since every $u \in \mathcal{U}$ induces the admissible pair
$\big(u(t,x(t)), \frac{\partial u}{\partial x}(t,x(t))\big)$ for (4.17)-(4.18) with the same state and cost. On the other hand,
if $(u_1^*, u_2^*)$ is a solution to problem (4.17)-(4.18) and $x^*$ is the corresponding optimal state
trajectory, then we can construct an optimal control $u^*$ for problem (4.15)-(4.16) as follows

$$
u^*(t,x) := u_2^*(t)\,x + u_1^*(t) - u_2^*(t)\,x^*(t). \tag{4.19}
$$

Therefore, $J_1^* = J_2^*$, which implies that if $u^*(t,x)$ is a solution to problem (4.15)-(4.16)
and $x^*$ is the corresponding optimal state trajectory, then $\big(u^*(t,x^*(t)), \frac{\partial u^*}{\partial x}(t,x^*(t))\big)$ is an
optimal control for problem (4.17)-(4.18) and leads to the same optimal state trajectory
$x^*$. Thus we can obtain the maximum principle for problem (4.15)-(4.16) faced by the
leader by means of the necessary conditions satisfied by the optimal control for problem
(4.17)-(4.18) (see, e.g., [27] or [32]). To this end, we define

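$$
\begin{aligned}
H_1(t, u_1, u_2, x, y, p_1, p_2, q_1, q_2)
={}& \langle p_1, f(t,x,u_1,\varphi(t,x,u_1,p_2)) \rangle + \langle q_1, \sigma(t,x) \rangle \\
& - \Big\langle y, \big(\tfrac{\partial f}{\partial x} + \tfrac{\partial f}{\partial u} u_2\big)^T p_2 + \big(\tfrac{\partial \sigma}{\partial x}\big)^T q_2 + \tfrac{\partial g_2}{\partial x} + u_2^T \tfrac{\partial g_2}{\partial u} \Big\rangle \\
& + g_1(t,x,u_1,\varphi(t,x,u_1,p_2)).
\end{aligned} \tag{4.20}
$$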

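Theorem 4.1. Suppose $u^*(t,x)$ is a solution to the leader's problem (4.15)-(4.16). Then
there exists a triple $(y, p_1, q_1)$ such that

$$
\Big(u^*(t, x(t)), \frac{\partial u^*}{\partial x}(t, x(t))\Big)
= \arg\min_{(u_1, u_2)} H_1(t, u_1, u_2, x(t), y(t), p_1(t), p_2(t), q_1(t), q_2(t)) \tag{4.21}
$$

and

$$
\begin{cases}
dy(t) = -\frac{\partial H_1}{\partial p_2}\,dt - \frac{\partial H_1}{\partial q_2}\,dW(t), \\
dp_1(t) = -\frac{\partial H_1}{\partial x}\,dt + q_1(t)\,dW(t), \\
y(0) = 0, \quad p_1(T) = -\frac{\partial^2 G_2}{\partial x^2}(x(T))\,y(T) + \frac{\partial G_1}{\partial x}(x(T)),
\end{cases} \tag{4.22}
$$

where $(x, p_2, q_2)$ is the solution of the state equation (4.16) with control $u^*(t,x)$, and
$\frac{\partial H_1}{\partial p_2}$, $\frac{\partial H_1}{\partial q_2}$ and $\frac{\partial H_1}{\partial x}$ in (4.22) are evaluated at

$$
\Big(t, u^*(t, x(t)), \frac{\partial u^*}{\partial x}(t, x(t)), x(t), y(t), p_1(t), p_2(t), q_1(t), q_2(t)\Big).
$$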

Remark 4.2. If $u$ is independent of $x$, we conclude, in comparison with the arguments in
section 3, that the closed-loop Stackelberg solution reduces to the open-loop Stackelberg
solution and the maximum principles for both cases are identical.
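
The reduction in Remark 4.2 can be checked directly on the Hamiltonians. The following display is only a sketch of that check and is not part of the original argument: when $u$ does not depend on $x$, the second control $u_2 = \partial u/\partial x$ vanishes, and setting $u_1 = u$, $u_2 = 0$ in (4.20) gives

```latex
\begin{aligned}
H_1(t, u, 0, x, y, p_1, p_2, q_1, q_2)
={}& \langle p_1, f(t,x,u,\varphi(t,x,u,p_2)) \rangle + \langle q_1, \sigma(t,x) \rangle
     + g_1(t,x,u,\varphi(t,x,u,p_2)) \\
& - \Big\langle y, \Big(\frac{\partial f}{\partial x}\Big)^T p_2
     + \Big(\frac{\partial \sigma}{\partial x}\Big)^T q_2
     + \frac{\partial g_2}{\partial x} \Big\rangle .
\end{aligned}
```

This is the open-loop Hamiltonian (3.3) with $v^* = \varphi$. In the same way, the terms containing $\partial u/\partial x$ drop out of the BSDE in (4.16), which then coincides with the BSDE in (3.2).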



5 The linear quadratic Stackelberg games 

In this section we consider linear quadratic open-loop and closed-loop Stackelberg games.
Yong [30] derives the Riccati equation for the open-loop Stackelberg game, where the
weighting matrices of the state and controls in the cost functionals are not assumed to be
positive definite, and controls are allowed to appear in the diffusion term. For
the follower's problem, the author uses the solutions of the follower's Riccati equation and
a BSDE to give the state feedback representation of the follower's optimal strategy (one
can also refer to [33, Page 313] for a similar derivation of the state feedback representation
for a linear quadratic stochastic control problem with deterministic coefficients). To be
precise, the author assumes that the follower's adjoint variable $p_2$ in (5.3) has the affine
form

$$
p_2 = Px + \phi.
$$

Applying Itô's formula to $p_2$ and taking into account (5.1) and (5.3), one can get the
follower's Riccati equation with respect to $P$ and a BSDE for $\phi$. Then the author views
the above BSDE for $\phi$, which contains the solution of the follower's Riccati equation and
the leader's adopted strategy, together with the original state equation as the leader's
controlled system and further derives the leader's Riccati equation. Under some assumptions
the author also discusses the solvability of the Riccati equations for the case of deterministic
coefficients. Here we consider the follower's Hamiltonian system (5.4) as the leader's
controlled state equation, and hence the state feedback representation of the Stackelberg
solution can be obtained at the same time for the leader and the follower. As a result,
the corresponding Riccati equation here is of a different form from the one in [30]. Since we
deal with the case without decision variables in the diffusion term, we also show, under
some appropriate assumptions, the existence and uniqueness of the solution to the derived
Riccati equation with stochastic coefficients by means of a linear transformation to the
standard stochastic Riccati equation. For the linear quadratic closed-loop Stackelberg
game, we will see that the Hamiltonian system for the leader is no longer linear, which
prevents us from getting an exogenous Riccati equation if we proceed in the same way as
in the open-loop case. Instead, we assume that the forward variable $y$ is linear with
respect to the original state $x$ and derive an exogenous FBSDE which plays the same
role as the Riccati equation in the open-loop case. Throughout this section we assume the
coefficients $A, B_i, C, Q_i, R_i, G_i$ are adapted bounded matrices, $Q_i, R_i, G_i$ are symmetric
and nonnegative, and $R_i$ are uniformly positive, $i = 1, 2$.
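
To make the affine ansatz concrete, the following display sketches the Itô computation in the setting of this section (no controls in the diffusion term). It is a worked illustration under assumptions that are not part of the text above: the coefficients are deterministic, the Brownian motion is one-dimensional, $P$ is differentiable in $t$, and $\phi$ is written as $d\phi = \alpha\,dt + \beta\,dW$ with auxiliary processes $\alpha, \beta$ introduced here for the computation only.

```latex
\begin{aligned}
dp_2 &= \big(\dot P x + P(Ax + B_1 u - B_2 R_2^{-1} B_2^T (Px + \phi)) + \alpha\big)\,dt
        + (PCx + \beta)\,dW, \\
dp_2 &= -\big(A^T (Px + \phi) + C^T q_2 + Q_2 x\big)\,dt + q_2\,dW .
\end{aligned}
% Matching diffusion terms gives q_2 = PCx + \beta; matching the terms in x
% and the remaining terms of the drift gives
\begin{aligned}
& \dot P + PA + A^T P + C^T P C + Q_2 - P B_2 R_2^{-1} B_2^T P = 0, \qquad P(T) = G_2, \\
& -d\phi = \big[(A - B_2 R_2^{-1} B_2^T P)^T \phi + C^T \beta + P B_1 u\big]\,dt - \beta\,dW,
  \qquad \phi(T) = 0 .
\end{aligned}
```

The first line of the result is the follower's Riccati equation for $P$, and the second is a linear BSDE for $\phi$ driven by the leader's strategy $u$, which is how the leader's control enters the follower's feedback representation.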

5.1 The open-loop case 

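The state equation and cost functionals are given as follows.

$$
\begin{cases}
dx(t) = (Ax + B_1 u + B_2 v)\,dt + Cx\,dW(t), \\
x(0) = x_0,
\end{cases} \tag{5.1}
$$

$$
\begin{aligned}
J_1(u,v) &= \frac{1}{2} E\Big[\int_0^T \big(\langle Q_1 x(t), x(t) \rangle + \langle R_1 u(t), u(t) \rangle\big)\,dt + \langle G_1 x(T), x(T) \rangle\Big], \\
J_2(u,v) &= \frac{1}{2} E\Big[\int_0^T \big(\langle Q_2 x(t), x(t) \rangle + \langle R_2 v(t), v(t) \rangle\big)\,dt + \langle G_2 x(T), x(T) \rangle\Big].
\end{aligned} \tag{5.2}
$$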
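
For readers who want to experiment numerically with (5.1)-(5.2), the following Python fragment is a rough sketch, not a method from the paper. It assumes a scalar state, constant coefficients, and controls given as deterministic functions of time (a special case of open-loop strategies); the function names `simulate_lq` and `quadratic_cost` and all parameter names are hypothetical. The state is discretized with the Euler-Maruyama scheme and the cost is estimated by averaging over sample paths.

```python
import random


def simulate_lq(A, B1, B2, C, x0, u, v, T=1.0, steps=1000, paths=200, seed=0):
    """Euler-Maruyama paths of dx = (A x + B1 u(t) + B2 v(t)) dt + C x dW (scalar case).

    Returns a list of paths, each a list of steps + 1 state values.
    """
    rng = random.Random(seed)
    dt = T / steps
    all_paths = []
    for _ in range(paths):
        x = x0
        path = [x]
        for k in range(steps):
            t = k * dt
            dW = rng.gauss(0.0, dt ** 0.5)  # Brownian increment over one step
            x = x + (A * x + B1 * u(t) + B2 * v(t)) * dt + C * x * dW
            path.append(x)
        all_paths.append(path)
    return all_paths


def quadratic_cost(all_paths, control, Q, R, G, T=1.0):
    """Estimate (1/2) E[ int_0^T (Q x^2 + R c^2) dt + G x(T)^2 ] with a left Riemann sum.

    Pass the leader's control with (Q1, R1, G1) for J1, or the follower's
    control with (Q2, R2, G2) for J2.
    """
    steps = len(all_paths[0]) - 1
    dt = T / steps
    total = 0.0
    for path in all_paths:
        running = sum(
            (Q * path[k] ** 2 + R * control(k * dt) ** 2) * dt for k in range(steps)
        )
        total += 0.5 * (running + G * path[-1] ** 2)
    return total / len(all_paths)
```

With `C = 0` and zero controls the scheme reduces to the explicit Euler method for $\dot x = Ax$, which gives a simple sanity check before noise is switched on.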



Given the leader's strategy $u \in \mathcal{U}$, it is well known that the follower's problem

$$
\min_{v \in \mathcal{V}} J_2(u,v) = \frac{1}{2} E\Big[\int_0^T \big(\langle Q_2 x(t), x(t) \rangle + \langle R_2 v(t), v(t) \rangle\big)\,dt + \langle G_2 x(T), x(T) \rangle\Big]
$$

subject to

$$
\begin{cases}
dx(t) = (Ax + B_1 u + B_2 v)\,dt + Cx\,dW(t), \\
x(0) = x_0,
\end{cases}
$$

is a standard linear quadratic optimal control problem and the unique solution is

$$
v^*(t) = -R_2^{-1} B_2^T p_2,
$$

where $p_2$ is the first part of the solution $(p_2, q_2) \in S^2 \times M^2$ to the adjoint equation

$$
\begin{cases}
-dp_2(t) = (A^T p_2 + C^T q_2 + Q_2 x)\,dt - q_2\,dW(t), \\
p_2(T) = G_2 x(T).
\end{cases} \tag{5.3}
$$
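
The formula for $v^*$ and the adjoint equation (5.3) are the linear quadratic instance of the general system (3.1). As a short worked check (a sketch using the definition of $H_2$ from section 3, with $g_2 = \frac{1}{2}(\langle Q_2 x, x \rangle + \langle R_2 v, v \rangle)$ and $G_2(x) = \frac{1}{2}\langle G_2 x, x \rangle$):

```latex
\begin{aligned}
H_2(t,x,u,v,p_2,q_2) &= \langle p_2, Ax + B_1 u + B_2 v \rangle + \langle q_2, Cx \rangle
   + \tfrac{1}{2}\big(\langle Q_2 x, x \rangle + \langle R_2 v, v \rangle\big), \\
\frac{\partial H_2}{\partial v} &= B_2^T p_2 + R_2 v = 0
   \;\Longrightarrow\; v^* = -R_2^{-1} B_2^T p_2, \\
\frac{\partial H_2}{\partial x} &= A^T p_2 + C^T q_2 + Q_2 x, \qquad
\frac{\partial G_2}{\partial x}(x(T)) = G_2 x(T).
\end{aligned}
```

Since $R_2$ is uniformly positive, $H_2$ is strictly convex in $v$, so the stationary point is the unique minimizer; the last line reproduces the drift and the terminal condition of (5.3).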

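Then, the leader's problem is

\[
\min_{u \in \mathcal{U}} J_1(u) = \frac{1}{2} E\Big[\int_0^T \big(\langle Q_1 x(t), x(t)\rangle + \langle R_1 u(t), u(t)\rangle\big)\,dt + \langle G_1 x(T), x(T)\rangle\Big]
\]

subject to (the Hamiltonian system of the follower)

\[
\begin{cases}
dx(t) = (Ax + B_1 u - B_2 R_2^{-1} B_2^\top p_2)\,dt + Cx\,dW(t),\\
-dp_2(t) = (A^\top p_2 + C^\top q_2 + Q_2 x)\,dt - q_2\,dW(t),\\
x(0) = x_0, \quad p_2(T) = G_2 x(T).
\end{cases}
\tag{5.4}
\]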
The leader's problem is well-posed since for every $u \in \mathcal{U}$, the coefficients of the system
(5.4) satisfy the monotonicity condition proposed by Peng and Wu [25], which yields the
existence and uniqueness of the solution $(x, p_2, q_2)$ to the system (5.4). Moreover, by
arguments similar to those of Tang [29], we can get the following estimate

\[
E \sup_{0 \le t \le T} |p_2(t)|^2 + E \sup_{0 \le t \le T} |x(t)|^2 + E \int_0^T |q_2(t)|^2\,dt
\le L\Big(|x_0|^2 + E \int_0^T |u(t)|^2\,dt\Big),
\tag{5.5}
\]

where $L$ is a positive constant. With this estimate, we can adapt the relevant arguments
for standard linear quadratic optimal control problems in [TT] and conclude that the
leader's objective functional $J_1(u)$ is convex in $u$,

\[
\lim_{\|u\| \to \infty} J_1(u) = \infty,
\]

and $J_1(u)$ is Fréchet differentiable over $\mathcal{U}$ with the representation

\[
\langle J_1'(u), w\rangle = E\Big[\int_0^T \big(\langle Q_1(t) x(t; x_0, u), x(t; 0, w)\rangle + \langle R_1(t) u(t), w(t)\rangle\big)\,dt
+ \langle G_1 x(T; x_0, u), x(T; 0, w)\rangle\Big].
\tag{5.6}
\]

Here we use $x(\cdot\,; x_0, u)$ to represent the solution of (5.4) with initial state $x(0) = x_0$ and
control $u$. As a consequence of Proposition 2.1.2 in [TT], we know that the leader has a
unique optimal strategy $u^* \in \mathcal{U}$, which satisfies $J_1'(u^*) = 0$. Now we use a dual
representation to characterize the optimal strategy $u^*$.

Theorem 5.1. For each $u \in \mathcal{U}$, there exists a unique solution $(x, y, p_1, q_1, p_2, q_2)$ to the
FBSDE

\[
\begin{cases}
dx(t) = (Ax + B_1 u - B_2 R_2^{-1} B_2^\top p_2)\,dt + Cx\,dW(t),\\
-dp_2(t) = (A^\top p_2 + C^\top q_2 + Q_2 x)\,dt - q_2\,dW(t),\\
dy(t) = (Ay + B_2 R_2^{-1} B_2^\top p_1)\,dt + Cy\,dW(t),\\
-dp_1(t) = (A^\top p_1 + C^\top q_1 - Q_2 y + Q_1 x)\,dt - q_1\,dW(t),\\
x(0) = x_0, \quad y(0) = 0, \quad p_1(T) = -G_2 y(T) + G_1 x(T), \quad p_2(T) = G_2 x(T).
\end{cases}
\tag{5.7}
\]

The necessary and sufficient condition for $u$ to be the leader's optimal strategy is

\[
u(t) = -R_1^{-1} B_1^\top p_1(t).
\]

Proof. It can be seen that the FBSDEs for $(x, p_2, q_2)$ and $(y, p_1, q_1)$ are two
decoupled systems. Therefore, for given $u \in \mathcal{U}$, we can first get the unique solution
$(x, p_2, q_2)$ to the equation

\[
\begin{cases}
dx(t) = (Ax + B_1 u - B_2 R_2^{-1} B_2^\top p_2)\,dt + Cx\,dW(t),\\
-dp_2(t) = (A^\top p_2 + C^\top q_2 + Q_2 x)\,dt - q_2\,dW(t),\\
x(0) = x_0, \quad p_2(T) = G_2 x(T).
\end{cases}
\tag{5.8}
\]

Let $\bar{y} := -y$. Then the FBSDE for $(y, p_1, q_1)$ in (5.7) can be converted into the
following one

\[
\begin{cases}
d\bar{y}(t) = (A\bar{y} - B_2 R_2^{-1} B_2^\top p_1)\,dt + C\bar{y}\,dW(t),\\
-dp_1(t) = (A^\top p_1 + C^\top q_1 + Q_2 \bar{y} + Q_1 x)\,dt - q_1\,dW(t),\\
\bar{y}(0) = 0, \quad p_1(T) = G_2 \bar{y}(T) + G_1 x(T).
\end{cases}
\tag{5.9}
\]

The coefficients in the above system also satisfy the monotonicity condition in [26]. So
there exists a unique solution to (5.9), which also implies the existence and uniqueness of
the solution $(x, y, p_1, q_1, p_2, q_2)$ to FBSDE (5.7). The necessary part comes directly from
the maximum principle (3.4) and (3.5). Now we prove the sufficient part. Denote by

\[
\big(x(\cdot\,; x_0, u),\ y(\cdot\,; x_0, u),\ p_1(\cdot\,; x_0, u),\ q_1(\cdot\,; x_0, u),\ p_2(\cdot\,; x_0, u),\ q_2(\cdot\,; x_0, u)\big)
\]

and

\[
\big(x(\cdot\,; 0, w),\ y(\cdot\,; 0, w),\ p_1(\cdot\,; 0, w),\ q_1(\cdot\,; 0, w),\ p_2(\cdot\,; 0, w),\ q_2(\cdot\,; 0, w)\big)
\]

the solutions to the system of FBSDEs (5.7) with initial states and controls $(x_0, u)$ and
$(0, w)$, respectively. Using Ito's formula to compute

\[
\langle p_1(t; x_0, u), x(t; 0, w)\rangle + \langle p_2(t; 0, w), y(t; x_0, u)\rangle
\]

and taking the expectation, we can get

\[
\begin{aligned}
\langle J_1'(u), w\rangle
={}& E\langle G_1 x(T; x_0, u), x(T; 0, w)\rangle\\
& + E\int_0^T \big(\langle Q_1(t) x(t; x_0, u), x(t; 0, w)\rangle + \langle R_1(t) u(t), w(t)\rangle\big)\,dt\\
={}& E\int_0^T \langle R_1(t) u(t) + B_1^\top(t) p_1(t; x_0, u), w(t)\rangle\,dt.
\end{aligned}
\tag{5.10}
\]

Obviously $u = -R_1^{-1} B_1^\top p_1$ makes $J_1'(u)$ equal to zero, so it is an optimal strategy for the
leader. □

From the uniqueness of the optimal strategy, we also know that the FBSDE

\[
\begin{cases}
dx(t) = (Ax - B_1 R_1^{-1} B_1^\top p_1 - B_2 R_2^{-1} B_2^\top p_2)\,dt + Cx\,dW(t),\\
-dp_2(t) = (A^\top p_2 + C^\top q_2 + Q_2 x)\,dt - q_2\,dW(t),\\
dy(t) = (Ay + B_2 R_2^{-1} B_2^\top p_1)\,dt + Cy\,dW(t),\\
-dp_1(t) = (A^\top p_1 + C^\top q_1 - Q_2 y + Q_1 x)\,dt - q_1\,dW(t),\\
x(0) = x_0, \quad y(0) = 0, \quad p_1(T) = -G_2 y(T) + G_1 x(T), \quad p_2(T) = G_2 x(T),
\end{cases}
\tag{5.11}
\]

has a unique solution $(x, y, p_1, q_1, p_2, q_2)$, and the Stackelberg solution $(u^*, v^*)$ can be
written as

\[
u^* = -R_1^{-1} B_1^\top p_1, \qquad v^* = -R_2^{-1} B_2^\top p_2.
\tag{5.12}
\]

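In what follows we regard $(x, y)$ as the state and derive the feedback representation of the
Stackelberg solution $(u^*, v^*)$ in terms of $(x, y)$. We denote

\[
\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad
\mathbf{p} = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix}, \qquad
\mathbf{q} = \begin{pmatrix} q_1 \\ q_2 \end{pmatrix},
\]

and

\[
\mathbb{A} = \begin{pmatrix} A & 0 \\ 0 & A \end{pmatrix}, \quad
\mathbb{B} = \begin{pmatrix} B_1 R_1^{-1} B_1^\top & B_2 R_2^{-1} B_2^\top \\ -B_2 R_2^{-1} B_2^\top & 0 \end{pmatrix}, \quad
\mathbb{C} = \begin{pmatrix} C & 0 \\ 0 & C \end{pmatrix}, \quad
\mathbb{Q} = \begin{pmatrix} Q_1 & -Q_2 \\ Q_2 & 0 \end{pmatrix}, \quad
\mathbb{G} = \begin{pmatrix} G_1 & -G_2 \\ G_2 & 0 \end{pmatrix}.
\]

Then FBSDE (5.11) can be rewritten as

\[
\begin{cases}
d\mathbf{x}(t) = (\mathbb{A}\mathbf{x}(t) - \mathbb{B}\mathbf{p}(t))\,dt + \mathbb{C}\mathbf{x}\,dW(t),\\
d\mathbf{p}(t) = -(\mathbb{A}^\top \mathbf{p} + \mathbb{C}^\top \mathbf{q} + \mathbb{Q}\mathbf{x})\,dt + \mathbf{q}\,dW(t),\\
\mathbf{x}(0) = (x_0^\top, 0)^\top, \quad \mathbf{p}(T) = \mathbb{G}\mathbf{x}(T).
\end{cases}
\tag{5.13}
\]

Suppose there is a matrix-valued process $K$ such that

\[
\mathbf{p} = K\mathbf{x},
\tag{5.14}
\]

and $K$ has a stochastic differential of the form

\[
dK(t) = M(t)\,dt + L(t)\,dW(t).
\tag{5.15}
\]

Applying Ito's formula to $K\mathbf{x}$, we get

\[
\begin{aligned}
& M\mathbf{x}\,dt + L\mathbf{x}\,dW(t) + K(\mathbb{A}\mathbf{x} - \mathbb{B}K\mathbf{x})\,dt + K\mathbb{C}\mathbf{x}\,dW(t) + L\mathbb{C}\mathbf{x}\,dt\\
&\quad = d\mathbf{p}(t)\\
&\quad = -(\mathbb{A}^\top K\mathbf{x} + \mathbb{C}^\top \mathbf{q} + \mathbb{Q}\mathbf{x})\,dt + \mathbf{q}\,dW(t).
\end{aligned}
\tag{5.16}
\]

Comparing the diffusion terms in (5.16), we have

\[
\mathbf{q} = L\mathbf{x} + K\mathbb{C}\mathbf{x}.
\tag{5.17}
\]

Substituting this expression into (5.16) and comparing the drift terms, we get

\[
M\mathbf{x} + K(\mathbb{A}\mathbf{x} - \mathbb{B}K\mathbf{x}) + L\mathbb{C}\mathbf{x}
= -\mathbb{A}^\top K\mathbf{x} - \mathbb{C}^\top(L\mathbf{x} + K\mathbb{C}\mathbf{x}) - \mathbb{Q}\mathbf{x},
\]

which yields

\[
M = -K\mathbb{A} - \mathbb{A}^\top K + K\mathbb{B}K - L\mathbb{C} - \mathbb{C}^\top L - \mathbb{C}^\top K\mathbb{C} - \mathbb{Q}.
\tag{5.18}
\]

Therefore, we get the Riccati equation

\[
\begin{cases}
dK(t) = -(K\mathbb{A} + \mathbb{A}^\top K - K\mathbb{B}K + L\mathbb{C} + \mathbb{C}^\top L + \mathbb{C}^\top K\mathbb{C} + \mathbb{Q})\,dt + L\,dW(t),\\
K(T) = \mathbb{G}.
\end{cases}
\tag{5.19}
\]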
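As a rough numerical illustration, which is not part of the original argument, consider the special case $n = 1$ with constant deterministic coefficients. Under the additional assumption that one looks for a deterministic solution with $L = 0$, the Riccati equation (5.19) reduces to a $2 \times 2$ matrix ODE that can be integrated backward from the terminal condition. The function names, the block layout taken from the open-loop system (5.11), and the classical Runge-Kutta scheme below are illustrative choices and are not taken from the paper.

```python
# Sketch: backward RK4 for the matrix ODE obtained from (5.19) when n = 1,
# all coefficients are deterministic constants, and L = 0 (an assumption):
#   K'(t) = -(K A + A^T K - K B K + C^T K C + Q),   K(T) = G,
# where A, B, C, Q, G are the 2x2 block matrices built from (5.11).

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def lincomb(*terms):
    """Sum of coefficient * matrix over (coefficient, matrix) pairs."""
    return [[sum(c * M[i][j] for c, M in terms) for j in range(2)]
            for i in range(2)]


def block_coefficients(A, B1, B2, C, Q1, Q2, R1, R2, G1, G2):
    """Build the 2x2 block matrices for scalar data (hypothetical helper)."""
    b1, b2 = B1 * B1 / R1, B2 * B2 / R2
    bA = [[A, 0.0], [0.0, A]]
    bB = [[b1, b2], [-b2, 0.0]]
    bC = [[C, 0.0], [0.0, C]]
    bQ = [[Q1, -Q2], [Q2, 0.0]]
    bG = [[G1, -G2], [G2, 0.0]]
    return bA, bB, bC, bQ, bG


def riccati_rhs(K, bA, bB, bC, bQ):
    """Right-hand side of the ODE; bA and bC are diagonal, hence symmetric."""
    KA = matmul(K, bA)
    AK = matmul(bA, K)
    KBK = matmul(matmul(K, bB), K)
    CKC = matmul(matmul(bC, K), bC)
    return lincomb((-1.0, KA), (-1.0, AK), (1.0, KBK), (-1.0, CKC), (-1.0, bQ))


def solve_riccati(coeffs, T=1.0, steps=1000):
    """Integrate from t = T down to t = 0 and return the approximation of K(0)."""
    bA, bB, bC, bQ, bG = coeffs
    h = -T / steps  # negative step: we march backward in time
    K = [row[:] for row in bG]

    def f(M):
        return riccati_rhs(M, bA, bB, bC, bQ)

    for _ in range(steps):
        k1 = f(K)
        k2 = f(lincomb((1.0, K), (h / 2, k1)))
        k3 = f(lincomb((1.0, K), (h / 2, k2)))
        k4 = f(lincomb((1.0, K), (h, k3)))
        K = lincomb((1.0, K), (h / 6, k1), (h / 3, k2), (h / 3, k3), (h / 6, k4))
    return K
```

For example, `solve_riccati(block_coefficients(0.1, 1.0, 0.5, 0.2, 1.0, 0.5, 1.0, 1.0, 1.0, 0.5))` returns an approximation of $K(0)$ for one hypothetical data set. The sketch says nothing about the genuinely stochastic case, where $L$ does not vanish in general.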



The difference between the above Riccati equation and the standard one from stochastic
LQ problems without control in diffusion terms (see, e.g., [25]) is that the block matrices
$\mathbb{B}$, $\mathbb{Q}$ and $\mathbb{G}$ here are not symmetric. For $n = 1$ and under some appropriate
assumptions on the coefficient matrices, we show in the following proposition that the Riccati
equation (5.19) can be connected to a standard one through a linear transformation for FBSDE (5.13).
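Proposition 5.2. Suppose that $n = 1$ and that $\alpha$ and $\beta$ are two positive constants such that

\[
\frac{Q_2}{Q_1} = \frac{G_2}{G_1} = \alpha, \qquad
\frac{B_2 R_2^{-1} B_2^\top}{B_1 R_1^{-1} B_1^\top} = \beta.
\]

Then the Riccati equation (5.19) has a unique solution.

Proof. Let $\mathbf{x}, \mathbf{p}, \mathbf{q}$ and $\mathbb{A}, \mathbb{B}, \mathbb{C}, \mathbb{Q}, \mathbb{G}$ denote the stacked processes and block
matrices appearing in (5.13). We make the transformation

\[
\mathbf{x} = \tilde{\mathbf{x}}, \qquad \mathbf{p} = \Phi\tilde{\mathbf{p}}, \qquad \mathbf{q} = \Phi\tilde{\mathbf{q}},
\tag{5.20}
\]

where

\[
\Phi = \frac{1}{1 + 4\alpha\beta} \begin{pmatrix} 1 & -2\beta \\ 2\alpha & 1 \end{pmatrix}.
\]

Then FBSDE (5.13) can be converted into the following one

\[
\begin{cases}
d\tilde{\mathbf{x}}(t) = (\tilde{\mathbb{A}}\tilde{\mathbf{x}}(t) - \tilde{\mathbb{B}}\tilde{\mathbf{p}}(t))\,dt + \tilde{\mathbb{C}}\tilde{\mathbf{x}}\,dW(t),\\
d\tilde{\mathbf{p}}(t) = -(\tilde{\mathbb{A}}^\top \tilde{\mathbf{p}} + \tilde{\mathbb{C}}^\top \tilde{\mathbf{q}} + \tilde{\mathbb{Q}}\tilde{\mathbf{x}})\,dt + \tilde{\mathbf{q}}\,dW(t),\\
\tilde{\mathbf{x}}(0) = (x_0, 0)^\top, \quad \tilde{\mathbf{p}}(T) = \tilde{\mathbb{G}}\tilde{\mathbf{x}}(T),
\end{cases}
\tag{5.21}
\]

where

\[
\tilde{\mathbb{A}} = \mathbb{A}, \qquad \tilde{\mathbb{C}} = \mathbb{C},
\]

\[
\tilde{\mathbb{B}} = \mathbb{B}\Phi = \frac{1}{1 + 4\alpha\beta}
\begin{pmatrix}
B_1 R_1^{-1} B_1^\top + 2\alpha B_2 R_2^{-1} B_2^\top & -B_2 R_2^{-1} B_2^\top \\
-B_2 R_2^{-1} B_2^\top & 2\beta B_2 R_2^{-1} B_2^\top
\end{pmatrix},
\]

\[
\tilde{\mathbb{Q}} = \Phi^{-1}\mathbb{Q} =
\begin{pmatrix} Q_1 + 2\beta Q_2 & -Q_2 \\ -Q_2 & 2\alpha Q_2 \end{pmatrix}, \qquad
\tilde{\mathbb{G}} = \Phi^{-1}\mathbb{G} =
\begin{pmatrix} G_1 + 2\beta G_2 & -G_2 \\ -G_2 & 2\alpha G_2 \end{pmatrix}.
\]

Now the matrices $\tilde{\mathbb{B}}$, $\tilde{\mathbb{Q}}$ and $\tilde{\mathbb{G}}$ are symmetric and positive definite. Suppose

\[
\tilde{\mathbf{p}} = \tilde{K}\tilde{\mathbf{x}},
\]

and

\[
d\tilde{K} = \tilde{M}\,dt + \tilde{L}\,dW(t).
\]

With the same procedure used to derive the Riccati equation (5.19), we can get a standard Riccati
equation for $(\tilde{K}, \tilde{L})$:

\[
\begin{cases}
d\tilde{K}(t) = -(\tilde{K}\tilde{\mathbb{A}} + \tilde{\mathbb{A}}^\top \tilde{K} - \tilde{K}\tilde{\mathbb{B}}\tilde{K} + \tilde{L}\tilde{\mathbb{C}} + \tilde{\mathbb{C}}^\top \tilde{L} + \tilde{\mathbb{C}}^\top \tilde{K}\tilde{\mathbb{C}} + \tilde{\mathbb{Q}})\,dt + \tilde{L}\,dW(t),\\
\tilde{K}(T) = \tilde{\mathbb{G}}.
\end{cases}
\tag{5.22}
\]

According to the results in [8] or [25], or the more general case in [29], we know that the Riccati
equation (5.22) has a unique solution $(\tilde{K}, \tilde{L})$ and

\[
\tilde{\mathbf{p}} = \tilde{K}\tilde{\mathbf{x}}, \qquad \tilde{\mathbf{q}} = (\tilde{L} + \tilde{K}\tilde{\mathbb{C}})\tilde{\mathbf{x}}.
\]

Consequently,

\[
\begin{aligned}
\mathbf{p} &= \Phi\tilde{\mathbf{p}} = \Phi\tilde{K}\tilde{\mathbf{x}} = \Phi\tilde{K}\mathbf{x},\\
\mathbf{q} &= \Phi\tilde{\mathbf{q}} = \Phi(\tilde{L} + \tilde{K}\tilde{\mathbb{C}})\tilde{\mathbf{x}} = \Phi(\tilde{L} + \tilde{K}\mathbb{C})\mathbf{x}.
\end{aligned}
\tag{5.23}
\]

Comparing (5.23) with (5.14) and (5.17), we finally get

\[
K = \Phi\tilde{K}, \qquad L = \Phi\tilde{L}.
\]

From (5.12) we obtain that the Stackelberg solution $(u^*, v^*)$ has a feedback representation
in terms of the state $(x, y)$. □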
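The symmetrizing effect of the transformation in Proposition 5.2 can be checked numerically for sample scalar data. The sketch below assumes the reading $Q_2 = \alpha Q_1$, $G_2 = \alpha G_1$, $b_2 = \beta b_1$ with $b_i := B_i^2 / R_i$, and $\Phi = \frac{1}{1 + 4\alpha\beta}\begin{pmatrix} 1 & -2\beta \\ 2\alpha & 1 \end{pmatrix}$; the helper names and the sample values are hypothetical and serve only as an illustration.

```python
# Sketch: for n = 1, verify that B*Phi, Phi^{-1}*Q and Phi^{-1}*G are symmetric
# and positive definite under the (assumed) proportionality conditions
# Q2 = alpha*Q1, G2 = alpha*G1, b2 = beta*b1, where b_i = B_i^2 / R_i.

def matmul2(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transformed_matrices(Q1, G1, b1, alpha, beta):
    """Return (B_tilde, Q_tilde, G_tilde) for scalar data."""
    Q2, G2, b2 = alpha * Q1, alpha * G1, beta * b1
    d = 1.0 + 4.0 * alpha * beta
    Phi = [[1.0 / d, -2.0 * beta / d], [2.0 * alpha / d, 1.0 / d]]
    Phi_inv = [[1.0, 2.0 * beta], [-2.0 * alpha, 1.0]]
    bB = [[b1, b2], [-b2, 0.0]]
    bQ = [[Q1, -Q2], [Q2, 0.0]]
    bG = [[G1, -G2], [G2, 0.0]]
    return matmul2(bB, Phi), matmul2(Phi_inv, bQ), matmul2(Phi_inv, bG)


def is_symmetric_positive_definite(M, tol=1e-12):
    """Sylvester's criterion for a 2x2 matrix, plus a symmetry check."""
    symmetric = abs(M[0][1] - M[1][0]) < tol
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return symmetric and M[0][0] > 0.0 and det > 0.0
```

Calling `transformed_matrices(1.0, 2.0, 1.5, 0.5, 0.25)` and passing each returned matrix to `is_symmetric_positive_definite` exercises the check for one data set with strictly positive $Q_1$, $G_1$ and $b_1$.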



5.2 The closed-loop case 

As pointed out in the deterministic case [20], the relative independence of the leader's
strategy $u$ and its derivative $\frac{\partial u}{\partial x}$ in a closed-loop Stackelberg game makes the leader so
powerful that his Hamiltonian $H$ is likely to achieve $-\infty$ if there is no restriction on the
derivative $\frac{\partial u}{\partial x}$. One way to restrict the leader's strength is to add a penalty term on $\frac{\partial u}{\partial x}$ to
his cost functional so that $H$ is convex with respect to $(u, \frac{\partial u}{\partial x})$. The other way is to
impose a priori bounds on $\frac{\partial u}{\partial x}$ to keep $H$ finite. In this section we adopt the latter
approach and assume $\frac{\partial u}{\partial x}$ to be bounded, since it will appear as the coefficient of the unknowns
in the adjoint equations, and the boundedness of the derivative $\frac{\partial u}{\partial x}$ implies the well-posedness
of the leader's problem when affine strategies are adopted. For simplicity, we consider a
one-dimensional linear quadratic game, with the state equation and cost functionals of
the two players as follows:

\[
\begin{cases}
dx(t) = [Ax(t) + B_1 u(t) + B_2 v(t)]\,dt + Cx(t)\,dW(t),\\
x(0) = x_0,
\end{cases}
\]

and

\[
\begin{aligned}
J_1 &= \frac{1}{2} E\Big[\int_0^T \big(Q_1 x^2(t) + R_1 u^2(t)\big)\,dt + G_1 x^2(T)\Big],\\
J_2 &= \frac{1}{2} E\Big[\int_0^T \big(Q_2 x^2(t) + R_2 v^2(t)\big)\,dt + G_2 x^2(T)\Big].
\end{aligned}
\]

The admissible strategy spaces from which the leader and the follower choose their
strategies are given by

\[
\begin{aligned}
\mathcal{U} := \Big\{ u \;\Big|\; & u : \Omega \times [0, T] \times \mathbb{R} \to U \text{ is } \mathcal{F}_t\text{-adapted for any } x \in \mathbb{R},\ u(t, x) \text{ is continuously}\\
& \text{differentiable in } x \text{ for any } (\omega, t) \in \Omega \times [0, T], \text{ and the derivative satisfies } \Big|\frac{\partial u}{\partial x}\Big| \le K\\
& \text{for some positive constant } K \Big\},\\
\mathcal{V} := \big\{ v \;\big|\; & v : \Omega \times [0, T] \times \mathbb{R}^n \to V \text{ is } \mathcal{F}_t\text{-adapted for any } x \in \mathbb{R}^n \big\}.
\end{aligned}
\]

Suppose that for each strategy $u \in \mathcal{U}$ of the leader, the follower has a unique optimal response
$v^* \in \mathcal{V}$. From (4.13) we know

\[
v^* = -R_2^{-1} B_2 p_2,
\]
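with $p_2$ satisfying

\[
\begin{cases}
dp_2(t) = -\Big[\Big(A + B_1 \dfrac{\partial u}{\partial x}\Big) p_2 + Cq_2 + Q_2 x\Big]\,dt + q_2\,dW(t),\\
p_2(T) = G_2 x(T).
\end{cases}
\]

Therefore the leader's problem is

\[
\min_{u \in \mathcal{U}} J_1 = \frac{1}{2} E\Big[\int_0^T \big(Q_1 x^2(t) + R_1 u^2(t)\big)\,dt + G_1 x^2(T)\Big]
\tag{5.24}
\]

subject to

\[
\begin{cases}
dx(t) = [Ax(t) + B_1 u(t, x(t)) - R_2^{-1} B_2^2 p_2(t)]\,dt + Cx(t)\,dW(t),\\
dp_2(t) = -\Big[\Big(A + B_1 \dfrac{\partial u}{\partial x}\Big) p_2 + Cq_2 + Q_2 x\Big]\,dt + q_2\,dW(t),\\
x(0) = x_0, \quad p_2(T) = G_2 x(T).
\end{cases}
\tag{5.25}
\]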

Suppose that for every $u(t, x) \in \mathcal{U}$, there is a unique solution $(x, p_2, q_2)$ to FBSDE (5.25).
According to the discussions in Section 4.2, we know that the leader will lose nothing if
he chooses his strategy among affine functions

\[
u(t, x) = u_2(t) x + u_1(t),
\]

with $u_1$ and $u_2$ being adapted processes and $|u_2| \le K$. Then the leader's equivalent
problem can be written as

\[
\min_{u_1, u_2} J_1 = \frac{1}{2} E\Big\{\int_0^T \big[Q_1 x^2(t) + R_1 (u_2(t) x(t) + u_1(t))^2\big]\,dt + G_1 x^2(T)\Big\}
\tag{5.26}
\]

subject to

\[
\begin{cases}
dx(t) = [(A + B_1 u_2) x + B_1 u_1 - R_2^{-1} B_2^2 p_2]\,dt + Cx(t)\,dW(t),\\
dp_2(t) = -[(A + B_1 u_2) p_2 + Cq_2 + Q_2 x]\,dt + q_2\,dW(t),\\
x(0) = x_0, \quad p_2(T) = G_2 x(T).
\end{cases}
\tag{5.27}
\]

For every pair $(u_1, u_2)$, the monotonicity condition guarantees the existence and uniqueness
of the solution to (5.27). Therefore, the leader's problem with strategies restricted
to affine form is well-posed. In what follows we use the maximum principle to get
the Hamiltonian system and the related Riccati equation for the leader's problem (5.26)-(5.27).
Denote

\[
\begin{aligned}
H_1(t, u_1, u_2, x, y, p_1, p_2, q_1, q_2)
={}& p_1 [(A + B_1 u_2) x + B_1 u_1 - R_2^{-1} B_2^2 p_2] + Cxq_1\\
& - y[(A + B_1 u_2) p_2 + Cq_2 + Q_2 x] + \frac{1}{2}[Q_1 x^2 + R_1 (u_2 x + u_1)^2].
\end{aligned}
\tag{5.28}
\]

To obtain $(u_1^*, u_2^*)$ that minimizes $H_1(t, u_1, u_2, x, y, p_1, p_2, q_1, q_2)$, we first fix $u_2$ and
minimize $H_1$ with respect to $u_1$. By computation,

\[
u_1^* = -u_2 x - R_1^{-1} B_1 p_1.
\tag{5.29}
\]
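The first-order condition behind (5.29) can be checked numerically. The sketch below evaluates the Hamiltonian $H_1$ of (5.28) for scalar data and the candidate minimizer $u_1^* = -u_2 x - R_1^{-1} B_1 p_1$; since $H_1$ is quadratic in $u_1$ with leading coefficient $R_1 / 2$, a perturbation $d$ of $u_1^*$ should increase $H_1$ by $R_1 d^2 / 2$. The function names and argument order are hypothetical.

```python
# Sketch: scalar Hamiltonian H_1 from (5.28) and the minimizer in u_1 from (5.29).

def hamiltonian(u1, u2, x, y, p1, p2, q1, q2, A, B1, B2, C, Q1, Q2, R1, R2):
    """Evaluate H_1(t, u1, u2, x, y, p1, p2, q1, q2) for fixed scalar coefficients."""
    drift_x = (A + B1 * u2) * x + B1 * u1 - B2 ** 2 / R2 * p2
    drift_p2 = (A + B1 * u2) * p2 + C * q2 + Q2 * x
    running_cost = 0.5 * (Q1 * x ** 2 + R1 * (u2 * x + u1) ** 2)
    return p1 * drift_x + C * x * q1 - y * drift_p2 + running_cost


def u1_minimizer(u2, x, p1, B1, R1):
    """Stationary point of H_1 in u1 for fixed u2 (requires R1 > 0)."""
    return -u2 * x - B1 * p1 / R1
```

Comparing `hamiltonian(u1_minimizer(...) + d, ...)` with `hamiltonian(u1_minimizer(...), ...)` for a few values of `d` gives a quick sanity check that the stationary point is a minimum when $R_1 > 0$.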
Substituting (5.29) into the expression (5.28) of $H_1$, we can see that the only term containing
$u_2$ is

\[
-B_1 y p_2 u_2.
\tag{5.30}
\]

Therefore, the optimal $u_2$ is

\[
u_2^* =
\begin{cases}
-K, & \text{if } \Delta > 0,\\
K, & \text{if } \Delta < 0,\\
\text{undefined}, & \text{if } \Delta = 0,
\end{cases}
\tag{5.31}
\]

where

\[
\Delta := -B_1 y p_2.
\]

To find a candidate optimal pair $(u_1^*, u_2^*)$, we set

\[
\begin{aligned}
u_2^* &:= \mathrm{bang}(K, -K; \Delta)\\
&:= \mathrm{sgn}(B_1 y p_2) K\\
&= \mathrm{sgn}(y)\,\mathrm{sgn}(B_1 p_2) K\\
&= \mathrm{sgn}(p_2)\,\mathrm{sgn}(B_1 y) K,
\end{aligned}
\]

where sgn is the sign function defined by

\[
\mathrm{sgn}(x) =
\begin{cases}
1, & \text{if } x > 0,\\
0, & \text{if } x = 0,\\
-1, & \text{if } x < 0.
\end{cases}
\]

From (5.29) we get

\[
u_1^* = -\mathrm{bang}(K, -K; \Delta) x - R_1^{-1} B_1 p_1.
\tag{5.32}
\]
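The bang-bang rule can be written as a small function. The sketch below uses the convention $\mathrm{sgn}(0) = 0$ from the definition above, so it returns $0$ when $\Delta = 0$, which is one possible choice in the case the text leaves undefined; the function names are hypothetical.

```python
# Sketch: the switching quantity Delta = -B1*y*p2 and the candidate
# u_2^* = bang(K, -K; Delta) = sgn(B1*y*p2) * K for scalar data.

def sgn(z):
    """Sign function with sgn(0) = 0."""
    return (z > 0) - (z < 0)


def delta(B1, y, p2):
    """Coefficient of u_2 in the Hamiltonian after substituting u_1^*."""
    return -B1 * y * p2


def bang(K, B1, y, p2):
    """Return -K when Delta > 0, K when Delta < 0, and 0 when Delta = 0."""
    return sgn(B1 * y * p2) * K
```

For a fixed $\Delta$, the value `delta(B1, y, p2) * u2` is linear in `u2`, so its minimum over $|u_2| \le K$ sits at an endpoint of the interval, which is what `bang` selects.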

If $(u_1^*, u_2^*) \in \mathcal{U} \times \mathcal{V}$ is a solution to the leader's problem (5.26)-(5.27), then the maximum
principle yields that there exist adapted processes $y$, $p_1$, and $q_1$ such that

\[
\begin{cases}
dx(t) = [(A + B_1 u_2^*) x + B_1 u_1^* - R_2^{-1} B_2^2 p_2]\,dt + Cx(t)\,dW(t),\\
dy(t) = [(A + B_1 u_2^*) y + R_2^{-1} B_2^2 p_1]\,dt + Cy\,dW(t),\\
dp_1(t) = -[(A + B_1 u_2^*) p_1 + Cq_1 - Q_2 y + Q_1 x + R_1 u_2^* (u_2^* x + u_1^*)]\,dt + q_1\,dW(t),\\
dp_2(t) = -[(A + B_1 u_2^*) p_2 + Cq_2 + Q_2 x]\,dt + q_2\,dW(t),\\
x(0) = x_0, \quad y(0) = 0, \quad p_1(T) = -G_2 y(T) + G_1 x(T), \quad p_2(T) = G_2 x(T),\\
u_1^* = -\mathrm{bang}(K, -K; \Delta) x - R_1^{-1} B_1 p_1, \quad u_2^* := \mathrm{bang}(K, -K; \Delta).
\end{cases}
\tag{5.33}
\]

As in the open-loop case, we proceed to express the optimal strategy $(u_1^*, u_2^*)$ in a
non-anticipating way by means of the state feedback representation. Substituting the
expressions of $u_1^*$ and $u_2^*$ into the FBSDE in (5.33), we get

\[
\begin{cases}
dx(t) = [Ax - R_1^{-1} B_1^2 p_1 - R_2^{-1} B_2^2 p_2]\,dt + Cx(t)\,dW(t),\\
dy(t) = [(A + B_1 \mathrm{bang}(K, -K; \Delta)) y + R_2^{-1} B_2^2 p_1]\,dt + Cy\,dW(t)\\
\phantom{dy(t)} = [Ay + \mathrm{sgn}(p_2) K |B_1 y| + R_2^{-1} B_2^2 p_1]\,dt + Cy\,dW(t),\\
dp_1(t) = -[Ap_1 + Cq_1 - Q_2 y + Q_1 x]\,dt + q_1\,dW(t),\\
dp_2(t) = -[(A + B_1 \mathrm{bang}(K, -K; \Delta)) p_2 + Cq_2 + Q_2 x]\,dt + q_2\,dW(t)\\
\phantom{dp_2(t)} = -[Ap_2 + \mathrm{sgn}(y) K |B_1 p_2| + Cq_2 + Q_2 x]\,dt + q_2\,dW(t),\\
x(0) = x_0, \quad y(0) = 0, \quad p_1(T) = -G_2 y(T) + G_1 x(T), \quad p_2(T) = G_2 x(T).
\end{cases}
\tag{5.34}
\]
In contrast to FBSDE (5.11) in the open-loop case, the presence of the additional nonlinear
term $\mathrm{bang}(K, -K; \Delta)$ in FBSDE (5.34) makes it a nonlinear system. Moreover, the
Lipschitz continuity assumption usually made for the coefficients in the literature does
not hold here. Therefore, the existence and uniqueness of the solution to (5.34), as far as
we know, is still not available. On the other hand, if we still view $(x, y)$ as the "state"
and represent $(p_1, p_2)$ in terms of $(x, y)$ as in the open-loop case, we cannot derive an
exogenous Riccati equation. Instead, we only see $x$ as the state and suppose

\[
y(t) = \xi(t) x(t), \qquad p_1(t) = \eta(t) x(t), \qquad p_2(t) = \zeta(t) x(t),
\tag{5.35}
\]

and

\[
\begin{aligned}
d\xi(t) &= \xi_1(t)\,dt + \xi_2(t)\,dW(t),\\
d\eta(t) &= \eta_1(t)\,dt + \eta_2(t)\,dW(t),\\
d\zeta(t) &= \zeta_1(t)\,dt + \zeta_2(t)\,dW(t).
\end{aligned}
\tag{5.36}
\]

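By Ito's formula and in view of (5.35),

\[
\begin{aligned}
dy(t) ={}& \xi(t)\,dx(t) + x(t)\,d\xi(t) + Cx(t)\xi_2(t)\,dt\\
={}& \xi(t)[Ax - R_1^{-1} B_1^2 p_1 - R_2^{-1} B_2^2 p_2]\,dt + C\xi(t) x(t)\,dW(t)\\
& + \xi_1(t) x(t)\,dt + \xi_2(t) x(t)\,dW(t) + C\xi_2(t) x(t)\,dt\\
={}& \{[A - R_1^{-1} B_1^2 \eta(t) - R_2^{-1} B_2^2 \zeta(t)]\xi(t) + \xi_1(t) + C\xi_2(t)\} x(t)\,dt\\
& + [C\xi(t) + \xi_2(t)] x(t)\,dW(t).
\end{aligned}
\tag{5.37}
\]

On the other hand,

\[
\begin{aligned}
dy(t) &= [(A + B_1 \mathrm{bang}(K, -K; \Delta)) y + R_2^{-1} B_2^2 p_1]\,dt + Cy\,dW(t)\\
&= [(A + B_1 \mathrm{bang}(K, -K; \Delta)) \xi(t) + R_2^{-1} B_2^2 \eta(t)] x(t)\,dt + C\xi(t) x(t)\,dW(t),
\end{aligned}
\tag{5.38}
\]

where

\[
\Delta := -B_1 \xi(t) \zeta(t).
\]

Comparing (5.37) and (5.38), we have

\[
\begin{aligned}
\xi_2(t) &= 0,\\
\xi_1(t) &= [R_1^{-1} B_1^2 \eta(t) + R_2^{-1} B_2^2 \zeta(t) + B_1 \mathrm{bang}(K, -K; \Delta)]\xi(t) + R_2^{-1} B_2^2 \eta(t).
\end{aligned}
\]

With the same procedure, we can get

\[
\begin{aligned}
\eta_1(t) &= [R_1^{-1} B_1^2 \eta(t) + R_2^{-1} B_2^2 \zeta(t) - 2A - C^2]\eta(t) + Q_2 \xi(t) - 2C\eta_2(t) - Q_1,\\
\zeta_1(t) &= [R_1^{-1} B_1^2 \eta(t) + R_2^{-1} B_2^2 \zeta(t) - 2A - C^2 - B_1 \mathrm{bang}(K, -K; \Delta)]\zeta(t) - 2C\zeta_2(t) - Q_2.
\end{aligned}
\]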
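Therefore, we derive the related Riccati equation for problem (5.26)-(5.27):

\[
\begin{cases}
d\xi(t) = \{[R_1^{-1} B_1^2 \eta(t) + R_2^{-1} B_2^2 \zeta(t) + B_1 \mathrm{bang}(K, -K; \Delta)]\xi(t) + R_2^{-1} B_2^2 \eta(t)\}\,dt\\
\phantom{d\xi(t)} = \{[R_1^{-1} B_1^2 \eta(t) + R_2^{-1} B_2^2 \zeta(t)]\xi(t) + \mathrm{sgn}(\zeta(t)) K |B_1 \xi(t)| + R_2^{-1} B_2^2 \eta(t)\}\,dt,\\
d\eta(t) = \{[R_1^{-1} B_1^2 \eta(t) + R_2^{-1} B_2^2 \zeta(t) - 2A - C^2]\eta(t) + Q_2 \xi(t) - 2C\eta_2(t) - Q_1\}\,dt + \eta_2(t)\,dW(t),\\
d\zeta(t) = \{[R_1^{-1} B_1^2 \eta(t) + R_2^{-1} B_2^2 \zeta(t) - 2A - C^2 - B_1 \mathrm{bang}(K, -K; \Delta)]\zeta(t) - 2C\zeta_2(t) - Q_2\}\,dt + \zeta_2(t)\,dW(t)\\
\phantom{d\zeta(t)} = \{[R_1^{-1} B_1^2 \eta(t) + R_2^{-1} B_2^2 \zeta(t) - 2A - C^2]\zeta(t) - \mathrm{sgn}(\xi(t)) K |B_1 \zeta(t)| - 2C\zeta_2(t) - Q_2\}\,dt + \zeta_2(t)\,dW(t),\\
\xi(0) = 0, \quad \eta(T) = -G_2 \xi(T) + G_1, \quad \zeta(T) = G_2.
\end{cases}
\]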
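The two equivalent ways of writing the drifts of $\xi$ and $\zeta$ (with the factor $\mathrm{bang}(K, -K; \Delta)$, or with the sign and absolute-value terms) can be compared pointwise. The sketch below does this for scalar values, assuming $\Delta = -B_1 \xi \zeta$ and $\mathrm{bang}(K, -K; \Delta) = \mathrm{sgn}(B_1 \xi \zeta) K$; all function names are hypothetical.

```python
# Sketch: drift coefficients of xi and zeta in the closed-loop Riccati system,
# written once with bang(K, -K; Delta) and once with sgn(.) * K * |.| terms.

def sgn(z):
    """Sign function with sgn(0) = 0."""
    return (z > 0) - (z < 0)


def common_factor(eta, zeta, B1, B2, R1, R2):
    """R1^{-1} B1^2 eta + R2^{-1} B2^2 zeta, shared by all three drifts."""
    return B1 ** 2 / R1 * eta + B2 ** 2 / R2 * zeta


def xi_drift_bang(xi, eta, zeta, B1, B2, R1, R2, K):
    u2 = sgn(B1 * xi * zeta) * K  # bang(K, -K; Delta) with Delta = -B1*xi*zeta
    return (common_factor(eta, zeta, B1, B2, R1, R2) + B1 * u2) * xi + B2 ** 2 / R2 * eta


def xi_drift_abs(xi, eta, zeta, B1, B2, R1, R2, K):
    return (common_factor(eta, zeta, B1, B2, R1, R2) * xi
            + sgn(zeta) * K * abs(B1 * xi) + B2 ** 2 / R2 * eta)


def zeta_drift_bang(xi, eta, zeta, zeta2, A, B1, B2, C, Q2, R1, R2, K):
    u2 = sgn(B1 * xi * zeta) * K
    bracket = common_factor(eta, zeta, B1, B2, R1, R2) - 2 * A - C ** 2 - B1 * u2
    return bracket * zeta - 2 * C * zeta2 - Q2


def zeta_drift_abs(xi, eta, zeta, zeta2, A, B1, B2, C, Q2, R1, R2, K):
    bracket = common_factor(eta, zeta, B1, B2, R1, R2) - 2 * A - C ** 2
    return bracket * zeta - sgn(xi) * K * abs(B1 * zeta) - 2 * C * zeta2 - Q2
```

The agreement rests on the identity $B_1 \xi\, \mathrm{sgn}(B_1 \xi \zeta) = \mathrm{sgn}(\zeta) |B_1 \xi|$ and its counterpart with $\xi$ and $\zeta$ exchanged.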

Suppose $(\xi, \eta, \zeta, \eta_2, \zeta_2)$ is a solution to the above FBSDE and $x^*$ solves the linear SDE

\[
\begin{cases}
dx(t) = [A - R_1^{-1} B_1^2 \eta(t) - R_2^{-1} B_2^2 \zeta(t)] x(t)\,dt + Cx(t)\,dW(t),\\
x(0) = x_0.
\end{cases}
\]

Then we can use Ito's formula to verify that

\[
\begin{aligned}
& y(t) := \xi(t) x^*(t), \qquad p_1(t) := \eta(t) x^*(t), \qquad p_2(t) := \zeta(t) x^*(t),\\
& q_1(t) := [C\eta(t) + \eta_2(t)] x^*(t), \qquad q_2(t) := [C\zeta(t) + \zeta_2(t)] x^*(t),
\end{aligned}
\]

together with $x^*$ solve the leader's Hamiltonian system (5.34). Therefore,

\[
u(t, x) = \mathrm{bang}(K, -K; \Delta) x - \mathrm{bang}(K, -K; \Delta) x^*(t) - R_1^{-1} B_1 \eta(t) x^*(t)
\]

with $\Delta = -B_1 \xi(t) \zeta(t)$ is a candidate for the leader's optimal strategy.